SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation
We present SimLex-999, a gold standard resource for evaluating distributional
semantic models that improves on existing resources in several important ways.
First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly
quantifies similarity rather than association or relatedness, so that pairs of
entities that are associated but not actually similar [Freud, psychology] have
a low rating. We show that, via this focus on similarity, SimLex-999
incentivizes the development of models with a different, and arguably wider,
range of applications than those which reflect conceptual association. Second,
SimLex-999 contains a range of concrete and abstract adjective, noun and verb
pairs, together with an independent rating of concreteness and (free)
association strength for each pair. This diversity enables fine-grained
analyses of the performance of models on concepts of different types, and
consequently greater insight into how architectures can be improved. Further,
unlike existing gold standard evaluations, for which automatic approaches have
reached or surpassed the inter-annotator agreement ceiling, state-of-the-art
models perform well below this ceiling on SimLex-999. There is therefore plenty
of scope for SimLex-999 to quantify future improvements to distributional
semantic models, guiding the development of the next generation of
representation-learning architectures.
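As an illustration of how a similarity gold standard of this kind is typically used, the sketch below scores a model by the Spearman rank correlation between its cosine similarities and the human ratings. The abstract does not name the metric, so treating Spearman correlation as the measure is an assumption here, and the function names, vectors and ratings are hypothetical toy values, not SimLex-999 data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def evaluate(embeddings, rated_pairs):
    """Correlate model similarities with gold ratings, skipping unknown words."""
    model_scores, gold_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            gold_scores.append(rating)
    return spearman(model_scores, gold_scores)

# Hypothetical toy embeddings and ratings (NOT taken from SimLex-999).
toy_embeddings = {
    "coast": [0.9, 0.1, 0.0],
    "shore": [0.8, 0.2, 0.1],
    "freud": [0.1, 0.9, 0.2],
    "psychology": [0.3, 0.7, 0.6],
    "cup": [0.0, 0.2, 0.9],
}
toy_pairs = [
    ("coast", "shore", 9.0),
    ("freud", "psychology", 2.5),
    ("coast", "cup", 0.5),
]

if __name__ == "__main__":
    print(evaluate(toy_embeddings, toy_pairs))
```

A rank correlation only asks the model to order the pairs as the annotators did, so it does not depend on the scale of the model's similarity scores.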
Reconstructing Native Language Typology from Foreign Language Usage
Linguists and psychologists have long been studying cross-linguistic
transfer, the influence of native language properties on linguistic performance
in a foreign language. In this work we provide empirical evidence for this
process in the form of a strong correlation between language similarities
derived from structural features in English as Second Language (ESL) texts and
equivalent similarities obtained from the typological features of the native
languages. We leverage this finding to recover native language typological
similarity structure directly from ESL text, and perform prediction of
typological features in an unsupervised fashion with respect to the target
languages. Our method achieves 72.2% accuracy on the typology prediction task,
a result that is highly competitive with equivalent methods that rely on
typological resources.
Comment: CoNLL 201
CausaLM: Causal Model Explanation Through Counterfactual Language Models
Understanding predictions made by deep neural networks is notoriously
difficult, but also crucial to their dissemination. Like all ML-based methods,
they are as good as their training data, and can also capture unwanted biases.
While there are tools that can help understand whether such biases exist, they
do not distinguish between correlation and causation, and might be ill-suited
for text-based models and for reasoning about high level language concepts. A
key problem of estimating the causal effect of a concept of interest on a given
model is that this estimation requires the generation of counterfactual
examples, which is challenging with existing generation technology. To bridge
that gap, we propose CausaLM, a framework for producing causal model
explanations using counterfactual language representation models. Our approach
is based on fine-tuning of deep contextualized embedding models with auxiliary
adversarial tasks derived from the causal graph of the problem. Concretely, we
show that by carefully choosing auxiliary adversarial pre-training tasks,
language representation models such as BERT can effectively learn a
counterfactual representation for a given concept of interest, and be used to
estimate its true causal effect on model performance. A byproduct of our method
is a language representation model that is unaffected by the tested concept,
which can be useful in mitigating unwanted bias ingrained in the data.
Comment: Our code and data are available at:
https://amirfeder.github.io/CausaLM/ Under review for the Computational
Linguistics journal
Multi-task Active Learning for Pre-trained Transformer-based Models
Multi-task learning, in which several tasks are jointly learned by a single
model, allows NLP models to share information from multiple annotations and may
facilitate better predictions when the tasks are inter-related. This technique,
however, requires annotating the same text with multiple annotation schemes
which may be costly and laborious. Active learning (AL) has been demonstrated
to optimize annotation processes by iteratively selecting unlabeled examples
whose annotation is most valuable for the NLP model. Yet, multi-task active
learning (MT-AL) has not been applied to state-of-the-art pre-trained
Transformer-based NLP models. This paper aims to close this gap. We explore
various multi-task selection criteria in three realistic multi-task scenarios,
reflecting different relations between the participating tasks, and demonstrate
the effectiveness of multi-task compared to single-task selection. Our results
suggest that MT-AL can be effectively used in order to minimize annotation
efforts for multi-task NLP models.
Comment: Accepted for publication in Transactions of the Association for
Computational Linguistics (TACL), 2022. Pre-MIT Press publication version
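To make the idea of a multi-task selection criterion concrete, here is a minimal sketch of one generic option: rank unlabeled examples by the average predictive entropy across tasks and send the most uncertain ones for annotation. The abstract does not specify which criteria the paper explores, so this is an assumed, simplified criterion rather than the authors' method, and the names entropy, multi_task_score and select_batch are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def multi_task_score(task_distributions):
    """Average the per-task entropies into a single uncertainty score."""
    return sum(entropy(p) for p in task_distributions) / len(task_distributions)

def select_batch(pool, batch_size):
    """Return the ids of the batch_size most uncertain unlabeled examples.

    pool maps an example id to a list holding one predicted probability
    distribution per task, produced by the current multi-task model.
    """
    scores = {ex_id: multi_task_score(dists) for ex_id, dists in pool.items()}
    return sorted(scores, key=scores.get, reverse=True)[:batch_size]

# Hypothetical model outputs for three unlabeled examples and two tasks.
toy_pool = {
    "a": [[0.5, 0.5], [0.5, 0.5]],     # uncertain on both tasks
    "b": [[0.99, 0.01], [0.9, 0.1]],   # confident on both tasks
    "c": [[0.5, 0.5], [0.99, 0.01]],   # uncertain on one task only
}

if __name__ == "__main__":
    print(select_batch(toy_pool, 2))
```

A single-task criterion would score each example with only one of the distributions. Averaging is just one way to combine tasks; taking the maximum or a weighted sum would be drop-in replacements for multi_task_score.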
Zero-Shot Semantic Parsing for Instructions
We consider a zero-shot semantic parsing task: parsing instructions into
compositional logical forms, in domains that were not seen during training. We
present a new dataset with 1,390 examples from 7 application domains (e.g. a
calendar or a file manager), each example consisting of a triplet: (a) the
application's initial state, (b) an instruction, to be carried out in the
context of that state, and (c) the state of the application after carrying out
the instruction. We introduce a new training algorithm that aims to train a
semantic parser on examples from a set of source domains, so that it can
effectively parse instructions from an unknown target domain. We integrate our
algorithm into the floating parser of Pasupat and Liang (2015), and further
augment the parser with features and a logical form candidate filtering logic,
to support zero-shot adaptation. Our experiments with various zero-shot
adaptation setups demonstrate substantial performance gains over a non-adapted
parser.
Comment: ACL 201